Cool Blog Classification from Positive and Unlabeled Examples

نویسندگان

  • Kritsada Sriphaew
  • Hiroya Takamura
  • Manabu Okumura
چکیده

We address the problem of cool blog classification using only positive and unlabeled examples. We propose an algorithm, called PUB, that exploits the information of unlabeled data together with the positive examples to predict whether the unseen blogs are cool or not. The algorithm uses the weighting technique to assign a weight to each unlabeled example which is assumed to be negative in the training set, and the bagging technique to obtain several weak classifiers, each of which is learned on a small training set generated by randomly sampling some positive examples and some unlabeled examples, which are assumed to be negative. Each of the weak classifiers must achieve admissible performance measure evaluated based on the whole labeled positive examples or has the best performance measure within iteration limit. The majority voting function on all weak classifiers is employed to predict the class of a test instance. The experimental results show that PUB can correctly predict the classes of unseen blogs where this situation cannot be handled by the traditional learning from positive and negative examples. The results also show that PUB outperforms other algorithms for learning from positive and unlabeled examples in the task of cool blog classification.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-Supervised Learning for Blog Classification

Blog classification (e.g., identifying bloggers’ gender or age) is one of the most interesting current problems in blog analysis. Although this problem is usually solved by applying supervised learning techniques, the large labeled dataset required for training is not always available. In contrast, unlabeled blogs can easily be collected from the web. Therefore, a semi-supervised learning metho...

متن کامل

Text Classification and Co-training from Positive and Unlabeled Examples

In the general framework of semi-supervised learning from labeled and unlabeled data, we consider the specific problem of learning from a pool of positive data, without any negative data but with the help of unlabeled data. We study a naive Bayes algorithm PNB from positive and unlabeled examples. Then, we consider the case where the number of positive examples is quite small, assuming that the...

متن کامل

Reliable Negative Extracting Based on kNN for Learning from Positive and Unlabeled Examples

Many real-world classification applications fall into the class of positive and unlabeled learning problems. The existing techniques almost all are based on the two-step strategy. This paper proposes a new reliable negative extracting algorithm for step 1. We adopt kNN algorithm to rank the similarity of unlabeled examples from the k nearest positive examples, and set a threshold to label some ...

متن کامل

Ensemble Based Positive Unlabeled Learning for Time Series Classification

Many real-world applications in time series classification fall into the class of positive and unlabeled (PU) learning. Furthermore, in many of these applications, not only are the negative examples absent, the positive examples available for learning can also be rather limited. As such, several PU learning algorithms for time series classification have recently been developed to learn from a s...

متن کامل

Positive and Unlabeled Examples Help Learning

In many learning problems, labeled examples are rare or expensive while numerous unlabeled and positive examples are available. However, most learning algorithms only use labeled examples. Thus we address the problem of learning with the help of positive and unlabeled data given a small number of labeled examples. We present both theoretical and empirical arguments showing that learning algorit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009